Abstract

We discuss the interest of escort distributions and Renyi entropy in the context of source coding. We first recall a source coding theorem by Campbell relating a generalized measure of length to the Renyi-Tsallis entropy. We show that the associated optimal codes can be obtained using considerations on escort distributions. We propose a new family of measures of length involving escort distributions, and we show that these generalized lengths are also bounded below by the Renyi entropy. Furthermore, we obtain that the standard Shannon code lengths are optimal for the new generalized length measures, whatever the entropic index. Finally, we show that there exists in this setting an interplay between standard and escort distributions.
1. Introduction

Renyi and Tsallis entropies extend the standard Shannon-Boltzmann entropy, enabling the construction of generalized thermostatistics that include the standard one as a special case. This has received considerable attention, and there is a wide variety of applications where experiments, numerical results and analytical derivations agree well with these new formalisms [1]. These results have also raised interest in the general study of information measures and their applications. The definition of Tsallis entropy was originally inspired by multifractals, where the Renyi entropy is an essential ingredient [2, 3], e.g. via the definition of the Renyi dimension. For a distribution $p$ of a discrete variable with $N$ possible microstates, the Renyi entropy of order $\alpha$, with $\alpha > 0$ and $\alpha \neq 1$, is defined by

$$ H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_{i=1}^{N} p_i^{\alpha}. \tag{1} $$

By L'Hôpital's rule, in the limit $\alpha \to 1$ we recover the Shannon entropy

$$ H_1(p) = -\sum_{i=1}^{N} p_i \log p_i. \tag{2} $$
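As a quick numerical illustration of (1) and (2), the following minimal Python sketch evaluates the Renyi entropy in base 2. The function name `renyi_entropy` and the test distribution (the one used later in Table 1) are our own illustrative choices. It lets one check that the Renyi entropy approaches the Shannon entropy as $\alpha \to 1$ and does not increase with $\alpha$.

```python
import math

def renyi_entropy(p, alpha, base=2.0):
    """Renyi entropy of order alpha, eq. (1); alpha == 1 returns the Shannon entropy, eq. (2)."""
    p = [pi for pi in p if pi > 0]  # zero-probability states contribute nothing for alpha > 0
    if alpha == 1.0:
        return -sum(pi * math.log(pi, base) for pi in p)
    return math.log(sum(pi ** alpha for pi in p), base) / (1.0 - alpha)

p = [0.48, 0.3, 0.1, 0.05, 0.05, 0.01, 0.01]  # illustrative distribution (the one of Table 1)
for alpha in (0.5, 0.999, 1.0, 2.0):
    # values decrease with alpha; 0.999 is close to the Shannon value at alpha = 1
    print(alpha, round(renyi_entropy(p, alpha), 4))
```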
The base of the logarithm is arbitrary. In the following, we denote by $\log_D$ the base-$D$ logarithm. The Tsallis entropy is a simple transformation of the Renyi entropy, but is nonextensive. Often associated with these entropies, and central to the formulation of nonextensive statistical mechanics, is the concept of escort distributions: if $\{p_i\}$ is the original distribution, then its escort distribution $P$ of order $q$ is defined by

$$ P_i = \frac{p_i^q}{\sum_{j=1}^{N} p_j^q}. \tag{3} $$

The parameter $q$ behaves as a microscope for exploring different regions of the measure $p$ [4]: for $q > 1$, the more singular regions are amplified, while for $q < 1$ the less singular regions are accentuated. Escort distributions were introduced as a tool in the context of multifractals. Interesting connections with standard thermodynamics are discussed in [4, 5], and their geometric properties are discussed in [6]. It is also interesting to note that escort distributions can be obtained as the solution of a maximum entropy problem with a constraint on the expected value of a logarithmic quantity; see [2, p. 53] in the context of multifractals, or [7] for a different view. We shall also point out that the 'deformed' information measures, such as the Renyi entropy (1), and the escort distributions (3) are originally two distinct concepts, as indicated here by the different notations $\alpha$ and $q$. There is a lengthy discussion on this point in [8].
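A minimal sketch of the escort transformation (3), using the same illustrative distribution; the helper name `escort` is hypothetical. It shows the 'microscope' effect: $q < 1$ moves weight towards the rare symbols, while $q > 1$ concentrates it on the most probable one.

```python
def escort(p, q):
    """Escort distribution of order q, eq. (3): P_i = p_i**q / sum_j p_j**q."""
    weights = [pi ** q for pi in p]
    total = sum(weights)
    return [w / total for w in weights]

p = [0.48, 0.3, 0.1, 0.05, 0.05, 0.01, 0.01]  # illustrative distribution (Table 1)
for q in (0.4, 1.0, 3.0):
    # q < 1 flattens p, q = 1 leaves it unchanged, q > 1 sharpens it around its mode
    print(q, [round(x, 3) for x in escort(p, q)])
```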

In the information theory of communication, entropy measures the quantity of information in a message, and a primary aim is to represent the possible messages efficiently, that is, to find a compact representation of the information according to a measure of 'compactness'. This is the role of source coding. In this note, we discuss the interest of escort distributions and alternative entropies in this context. This suggests possible connections between coding theory and the measure of complexity in nonextensive statistical mechanics. Related works are the study of generalized channel capacities [9], the notion of nonadditive information content [10], and the presentation of a generalized rate distortion theory [11]. Section 2 is devoted to a very short presentation of the source coding context and of the fundamental Shannon source coding theorem. In Section 3, we describe a source coding theorem relating a new measure of length and the Renyi entropy. In Section 4, we show that it is possible to obtain the very same optimal codes, as well as a practical procedure, using an argument based on the nonextensive generalized mean as the measure of length. In Section 5, we introduce another measure of length, involving escort distributions, and obtain general inequalities for this measure, where the lower bound is, once again, a Renyi entropy. We show that the corresponding optimal codes are the standard Shannon codes. Finally, in Section 6 we discuss the connections between these different results.



2. Source coding 

In source coding, one considers a set of symbols $X = \{x_1, x_2, \ldots, x_N\}$ and a source that produces symbols $x_i$ from $X$ with probabilities $p_i$, where $\sum_{i=1}^{N} p_i = 1$. The aim of source coding is to encode the source using an alphabet of size $D$, that is, to map each symbol $x_i$ to a codeword $c_i$ of length $l_i$ expressed using the $D$ letters of the alphabet. It is known that if the set of lengths $l_i$ satisfies the Kraft-McMillan inequality

$$ \sum_{i=1}^{N} D^{-l_i} \le 1, \tag{4} $$

then there exists a uniquely decodable code with these lengths, which means that any sequence $c_{i_1} c_{i_2} \ldots c_{i_n}$ can be decoded unambiguously into a sequence of symbols $x_{i_1} x_{i_2} \ldots x_{i_n}$. Furthermore, any uniquely decodable code satisfies the Kraft-McMillan inequality (4). The Shannon source coding theorem (noiseless coding theorem) states that the expected length $L$ of the code is bounded below by the entropy of the source, $H_1(p)$, and that the best uniquely decodable code satisfies

$$ H_1(p) \le L = \sum_i p_i l_i < H_1(p) + 1, \tag{5} $$

where the logarithm in the definition of the Shannon entropy is taken in base $D$. This result indicates that the Shannon entropy $H_1(p)$ is the fundamental limit on the minimum average length for any code constructed for the source. The ideal lengths of the individual codewords, also called 'bit-numbers' [5, p. 46], are given by

$$ l_i = -\log_D p_i. \tag{6} $$

These lengths attain the entropy, i.e. the left-hand side of (5); when they are not integers, rounding them up to $\lceil -\log_D p_i \rceil$ still satisfies (4) and costs less than one letter on average, which gives the right-hand side of (5). The characteristic of these optimal codes is that they assign the shorter codewords to the most likely symbols and the longer codewords to unlikely symbols. The uniquely decodable code can be chosen to have the prefix property, i.e. the property that no codeword is a prefix of another codeword.
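The following sketch, assuming $D = 2$ and using helper names of our own, builds the integer lengths $\lceil -\log_D p_i \rceil$ (often called a Shannon code; unlike the Huffman code it is not optimal in general) and checks the Kraft-McMillan inequality (4) and the bounds (5) numerically.

```python
import math

def shannon_code_lengths(p, D=2):
    """Integer lengths ceil(-log_D p_i): the ideal lengths (6) rounded up."""
    return [math.ceil(-math.log(pi) / math.log(D)) for pi in p]

def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft-McMillan inequality (4)."""
    return sum(D ** (-l) for l in lengths)

def shannon_entropy(p, D=2):
    """Shannon entropy (2) in base D."""
    return -sum(pi * math.log(pi) / math.log(D) for pi in p if pi > 0)

p = [0.48, 0.3, 0.1, 0.05, 0.05, 0.01, 0.01]  # illustrative distribution (Table 1)
l = shannon_code_lengths(p)
L = sum(pi * li for pi, li in zip(p, l))       # expected length appearing in (5)
print(l, kraft_sum(l), round(shannon_entropy(p), 4), round(L, 4))
```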



3. Source coding with Campbell measure of length 

It is well known that Huffman coding yields a prefix code which minimizes the expected length and approaches the optimal limit $l_i = -\log_D p_i$. What is much less well known is that other forms of length have been considered [12], the first and definitely fundamental contribution being the paper of Campbell [13]. Since the codeword lengths obey relation (6), low probabilities yield very long words. But the cost of using a word is not necessarily a linear function of its length, and adding a letter to a long word may cost much more than adding a letter to a shorter word. This led Campbell to propose a new average length measure, featuring an exponential account of the elementary lengths of the codewords. This length, called a $\beta$-exponential mean or Campbell length, is a Kolmogorov-Nagumo generalized mean associated with an exponential function. It is defined by

$$ C_\beta = \frac{1}{\beta} \log_D \sum_{i=1}^{N} p_i D^{\beta l_i}, \tag{7} $$

where $\beta$ is a strictly positive parameter. The remarkable result [13] is that, just as the Shannon entropy is the lower bound on the average codeword length of a uniquely decodable code, the Renyi entropy of order $q$, with $q = 1/(\beta + 1)$, is the lower bound on the exponentially weighted codeword length (7):

$$ C_\beta \ge H_q(p). \tag{8} $$

A simple proof of this result is given in Section 5. It is easy to check that equality is achieved by choosing the $l_i$ such that

$$ D^{-l_i} = P_i = \frac{p_i^q}{\sum_{j=1}^{N} p_j^q}, \tag{9} $$

that is

$$ l_i = -q \log_D p_i + (1-q) H_q(p). \tag{10} $$

The individual lengths obtained this way can be made smaller than the Shannon lengths $l_i = -\log_D p_i$, especially for small $p_i$, by selecting a sufficiently small value of $q$. Hence, the procedure effectively penalizes the longer codewords and yields a code different from Shannon's code, with possibly shorter codewords associated with the low probabilities.
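A numerical check of Campbell's bound (8) and of the optimal lengths (9)-(10), written as a minimal sketch: $D = 2$ and $\beta = 1$ (hence $q = 1/2$) are arbitrary illustrative choices, the helper names are hypothetical, and the optimal lengths (9) are kept real-valued.

```python
import math

def escort(p, q):
    """Escort distribution (3)."""
    w = [pi ** q for pi in p]
    s = sum(w)
    return [wi / s for wi in w]

def renyi_entropy(p, alpha, D=2):
    """Renyi entropy (1) in base D."""
    if alpha == 1.0:
        return -sum(pi * math.log(pi, D) for pi in p)
    return math.log(sum(pi ** alpha for pi in p), D) / (1.0 - alpha)

def campbell_length(p, lengths, beta, D=2):
    """Campbell's beta-exponential mean length, eq. (7)."""
    return math.log(sum(pi * D ** (beta * li) for pi, li in zip(p, lengths)), D) / beta

p = [0.48, 0.3, 0.1, 0.05, 0.05, 0.01, 0.01]
beta = 1.0
q = 1.0 / (beta + 1.0)                               # Renyi order appearing in (8)
ideal = [-math.log(Pi, 2) for Pi in escort(p, q)]    # lengths (9), generally non-integer
print(campbell_length(p, ideal, beta), renyi_entropy(p, q))  # equal: (8) is attained
shannon = [math.ceil(-math.log(pi, 2)) for pi in p]  # integer lengths satisfying Kraft
print(campbell_length(p, shannon, beta))             # not below H_q(p), by (8)
```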
4. Source coding with nonextensive generalized mean



In the standard measure of average length $L = \sum_i p_i l_i$, we have a linear combination of the individual lengths, with the probabilities $p_i$ as weights. In order to increase the impact of the longer lengths with low probabilities, Campbell's length uses an exponential of the length. A different approach to the problem is to modify the weights in the linear combination, so as to raise the importance of the terms with low probabilities. A simple way to achieve this is to deform, or flatten, the original probability distribution and to use the new distribution as weights rather than the $p_i$. A very good candidate is the escort distribution (with $q < 1$ for flattening), which leads us to the 'average length measure'

$$ M_q = \sum_{i=1}^{N} \frac{p_i^q}{\sum_{j=1}^{N} p_j^q} l_i = \sum_{i=1}^{N} P_i l_i, \tag{11} $$

which is nothing but the generalized expected value of nonextensive statistical mechanics according to the third choice of mean values of Tsallis, Mendes and Plastino [14]. For a virtual source with distribution $P$, the standard expected length is $M_q$, and the classical Shannon noiseless source coding theorem immediately applies, leading to

$$ M_q \ge H_1(P), \tag{12} $$

with equality if

$$ l_i = -\log_D P_i, \tag{13} $$

which are exactly the lengths (9)-(10) obtained via Campbell's measure. This easy result has also been mentioned in [10].¹
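The same kind of check for the generalized mean (11), as a sketch assuming $D = 2$, $q = 1/2$ and hypothetical helper names: the lengths (13) attain (12), and rounding them up gives integer lengths that satisfy the Kraft-McMillan inequality and stay within one letter of $H_1(P)$.

```python
import math

def escort(p, q):
    """Escort distribution (3)."""
    w = [pi ** q for pi in p]
    s = sum(w)
    return [wi / s for wi in w]

def generalized_mean_length(p, lengths, q):
    """M_q of eq. (11): expected length under the escort distribution P."""
    return sum(Pi * li for Pi, li in zip(escort(p, q), lengths))

def shannon_entropy(P, D=2):
    """Shannon entropy (2) in base D."""
    return -sum(Pi * math.log(Pi, D) for Pi in P if Pi > 0)

p = [0.48, 0.3, 0.1, 0.05, 0.05, 0.01, 0.01]
q = 0.5
P = escort(p, q)
ideal = [-math.log(Pi, 2) for Pi in P]                           # lengths (13)
print(generalized_mean_length(p, ideal, q), shannon_entropy(P))  # equal: (12) is attained
rounded = [math.ceil(li) for li in ideal]                        # integer lengths for a real code
print(generalized_mean_length(p, rounded, q))                    # not below H_1(P), by (12)
```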

The simple relation $l_i = -\log_D P_i$ for the minimization of $M_q$ subject to the Kraft-McMillan inequality has a direct practical implication. Indeed, it suffices to feed a standard coding algorithm, such as a Huffman coder, with the escort distribution $P$ instead of the natural distribution $p$ to obtain a code tailored for Campbell's length measure $C_\beta$ or, equivalently, for the length measure $M_q$. A simple example, with $D = 2$, is reported in Table 1: we used a standard Huffman algorithm with the original distribution and with the escort distributions for $q = 0.7$ and $q = 0.4$.



p_i      q = 1     q = 0.7   q = 0.4
0.48     0         0         00
0.3      10        10        01
0.1      110       1100      100
0.05     1110      1101      101
0.05     11110     1110      110
0.01     111110    11110     1110
0.01     111111    11111     1111

Table 1: Examples of codes in the binary case, for different values of q.
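The procedure described above (feeding the escort distribution to a standard Huffman coder) can be sketched in a few lines of Python. This is a textbook binary Huffman construction with hypothetical helper names, not the author's code; it returns only codeword lengths, which for this distribution match those of Table 1 up to the order of the two symbols with probability 0.05 (Huffman codes are not unique when probabilities tie).

```python
import heapq
import itertools

def escort(p, q):
    """Escort distribution (3)."""
    w = [pi ** q for pi in p]
    s = sum(w)
    return [wi / s for wi in w]

def huffman_lengths(weights):
    """Codeword lengths of a binary (D = 2) Huffman code for the given weights."""
    tie = itertools.count()  # tie-breaker so the heap never compares the symbol lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)  # two least probable nodes
        w2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1              # each merge adds one letter to every codeword below it
        heapq.heappush(heap, (w1 + w2, next(tie), s1 + s2))
    return lengths

p = [0.48, 0.3, 0.1, 0.05, 0.05, 0.01, 0.01]
codes = {q: huffman_lengths(escort(p, q)) for q in (1.0, 0.7, 0.4)}
for q, lengths in codes.items():
    print(q, lengths)
```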



¹ In this interesting paper, another inequality is given for the generalized mean: $M_q \ge S_q(p)$, where $S_q$ is the normalized version of the Tsallis entropy. In fact, this is only true under the condition $\sum_i \exp_q(-l_i) \le 1$, with equality occurring for $l_i = -\ln_q(p_i)$, where $\exp_q$ and $\ln_q$ denote the standard nonextensive $q$-deformed exponential and logarithm. When these lengths $l_i$ also fulfill the Kraft-McMillan inequality, we have $M_q = S_q(p) \ge H_1(P)$.
It is worth noting that some specific algorithms have been developed for Campbell's length [15, 16, 12]. The remark above gives an easy alternative. An important point is that these new codes have direct applications: they are optimal for minimizing the probability of buffer overflows [15] or, with $q > 1$, for maximizing the chance of receiving a message in a single snapshot [17]. In the second case, the choice $q > 1$ enhances the main features of the probability distribution, leading to more short codewords for the highest probabilities; this maximizes the chance of a complete reception of a message in a single transmission of limited size.

5. Another measure of length with Renyi bounds 

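Given these results, it is now interesting to introduce a new measure of average length, similar to Campbell's length but mixing an exponential weighting of the individual lengths $l_i$ with an escort distribution. This measure is defined by

$$ L_q = \frac{1}{q-1} \log_D \left( \sum_{i=1}^{N} \frac{p_i^q}{\sum_{j=1}^{N} p_j^q} D^{(q-1) l_i} \right). \tag{14} $$

Some specific values are as follows. It is easy to see that $L_0 = -\log_D \sum_i D^{-l_i} + \log_D N$. When $q \to +\infty$, the largest probability $p_M = \max_i p_i$ dominates the escort weights and $L_q \to \max_i (l_i + \log_D p_i) - \log_D p_M$; this limit reduces to $l_M$, the length associated with the maximum probability, whenever the maximum over $i$ is attained at $i = M$, as is the case for $l_i = -\log_D p_i$. By L'Hôpital's rule, we also obtain $L_1 = L = \sum_i p_i l_i$. As for Campbell's measure, it is possible to show that $L_q$ is bounded below by the Renyi entropy.

As in Campbell's original proof, let us consider the reverse Hölder inequality

$$ \left( \sum_{i=1}^{N} |x_i|^p \right)^{1/p} \left( \sum_{i=1}^{N} |y_i|^{p'} \right)^{1/p'} \le \sum_{i=1}^{N} |x_i y_i| \quad \text{for all } (x_1, \ldots, x_N), (y_1, \ldots, y_N) \in \mathbb{R}^N, \tag{15} $$

which holds for $p$ or $p'$ in $(0, 1)$ with $1/p + 1/p' = 1$. Note that the reverse inequality, i.e. the usual Hölder inequality, holds when $p$ and $p'$ are in $[1, +\infty)$. Suppose that the $l_i$ are the lengths of the codewords in a uniquely decodable code, which means that they satisfy the Kraft inequality (4). If we now let $x_i = p_i^{\alpha} D^{-l_i}$ and $y_i = p_i^{-\alpha}$, we obtain

$$ \left( \sum_{i=1}^{N} p_i^{\alpha p} D^{-p l_i} \right)^{1/p} \left( \sum_{i=1}^{N} p_i^{-\alpha p'} \right)^{1/p'} \le \sum_{i=1}^{N} D^{-l_i} \le 1, \tag{16} $$

where the last inequality on the right is the Kraft inequality.

If we let $\alpha p = 1$ with $p = -\beta$, then $\alpha = -1/\beta$, $p' = \beta/(\beta + 1) \in (0, 1)$, and $-\alpha p' = \alpha/(\alpha - 1) = 1/(\beta + 1)$. Then, (16) reduces to

$$ \left( \sum_{i=1}^{N} p_i D^{\beta l_i} \right)^{-1/\beta} \left( \sum_{i=1}^{N} p_i^{1/(\beta+1)} \right)^{(\beta+1)/\beta} \le 1. \tag{17} $$

Taking the base-$D$ logarithm, we obtain Campbell's theorem $C_\beta \ge H_q(p)$, with $q = 1/(\beta + 1)$. If we now take $\alpha p = q$ and choose $-\alpha p' = 1$, we obtain

$$ \left( \sum_{i=1}^{N} p_i^q D^{(q-1) l_i} \right)^{1/(1-q)} \left( \sum_{i=1}^{N} p_i \right)^{q/(q-1)} = \left( \sum_{i=1}^{N} p_i^q D^{(q-1) l_i} \right)^{1/(1-q)} \le 1, \tag{18} $$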
where we used, of course, the fact that the probabilities sum to one. The condition $1/p + 1/p' = 1$ easily gives $p = 1 - q$, so that $D^{-p l_i} = D^{(q-1) l_i}$. Dividing both sides by $\left( \sum_{j=1}^{N} p_j^q \right)^{1/(1-q)}$, taking the logarithm and changing the sign of the inequality, we finally obtain

$$ \frac{1}{q-1} \log_D \left( \sum_{i=1}^{N} \frac{p_i^q}{\sum_{j=1}^{N} p_j^q} D^{(q-1) l_i} \right) \ge \frac{1}{1-q} \log_D \sum_{i=1}^{N} p_i^q, \tag{19} $$

which gives the simple inequality

$$ L_q \ge H_q(p). \tag{20} $$

Hence the new length measure of order $q$ is bounded below by the Renyi entropy of the same order. Note that this result includes Shannon's result as the special case $q = 1$. Interestingly, it is easy to check that equality holds in (20) for $l_i = -\log_D p_i$, which are nothing but the optimal lengths in the Shannon coding theorem: in that case $D^{(q-1) l_i} = p_i^{1-q}$, the sum in (14) equals $1/\sum_j p_j^q$, and $L_q = H_q(p)$. Hence, it is remarkable that the whole family of inequalities (20) becomes equalities for the choice $l_i = -\log_D p_i$, which appears as a kind of universal value in this context. This result may draw attention to alternative coding algorithms based on the minimization of $L_q$, or to alternative characterizations of the optimal code. For instance, as $q \to +\infty$, $H_q(p) \to -\log_D p_M$ and $L_q \to \max_i (l_i + \log_D p_i) - \log_D p_M$, so that (20) reduces to $\max_i (l_i + \log_D p_i) \ge 0$ for every uniquely decodable code. Equality holds for the Shannon lengths $l_i = -\log_D p_i$, which therefore minimize the largest excess $l_i + \log_D p_i$ of a codeword length over its ideal value; for these lengths, the limit is $l_M = -\log_D p_M$, the length of the codeword of maximum probability.
Since the Renyi and Tsallis entropies are related by a simple monotone transformation, inequalities similar to (8) and (20) exist with Tsallis entropy bounds.
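A minimal sketch of the length measure (14), assuming $D = 2$ and hypothetical helper names, checks numerically the value of $L_0$, the continuity at $q = 1$, the bound (20) for the $q = 1$ Huffman lengths of Table 1, and equality in (20) for the real-valued Shannon lengths $l_i = -\log_D p_i$.

```python
import math

def escort(p, q):
    """Escort distribution (3)."""
    w = [pi ** q for pi in p]
    s = sum(w)
    return [wi / s for wi in w]

def renyi_entropy(p, alpha, D=2):
    """Renyi entropy (1) in base D."""
    if alpha == 1.0:
        return -sum(pi * math.log(pi, D) for pi in p)
    return math.log(sum(pi ** alpha for pi in p), D) / (1.0 - alpha)

def length_Lq(p, lengths, q, D=2):
    """Length measure L_q of eq. (14); q = 1 is handled as its limit L_1 = sum_i p_i l_i."""
    if q == 1.0:
        return sum(pi * li for pi, li in zip(p, lengths))
    P = escort(p, q)
    return math.log(sum(Pi * D ** ((q - 1) * li) for Pi, li in zip(P, lengths)), D) / (q - 1)

p = [0.48, 0.3, 0.1, 0.05, 0.05, 0.01, 0.01]
shannon_ideal = [-math.log(pi, 2) for pi in p]  # lengths giving equality in (20)
huffman = [1, 2, 3, 4, 5, 6, 6]                 # q = 1 code lengths of Table 1 (Kraft sum = 1)
for q in (0.3, 0.7, 2.0, 5.0):
    # first two values coincide (equality in (20)); the third one is larger
    print(q, length_Lq(p, shannon_ideal, q), renyi_entropy(p, q), length_Lq(p, huffman, q))
```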



6. Connections between the different length measures 

It is finally useful to exhibit an interplay between the two length measures, their minimizers, and the standard and escort distributions. Campbell's measure (7) involves the distribution $p$ and an exponential weight with index $\beta$. The optimal lengths that achieve equality in (8) are the bit-numbers $l_i = -\log_D P_i$ associated with the escort distribution. On the other hand, the measure (14) involves the escort distribution $P$ instead of $p$, has an index $q$, and the optimal lengths that achieve equality in the extended source coding inequality (20) are the bit-numbers $l_i = -\log_D p_i$ associated with the original distribution. We know that the transformation $q \leftrightarrow 1/q$ [14, p. 543] links the original and escort distributions: the distribution $p$ is the escort distribution of index $1/q$ of the distribution $P$. This remark makes it possible to establish an equivalence between thermostatistical formalisms based on linear and generalized averages [18, 19]. Here, when (14) is written for the distribution $P$ with index $1/q$ instead of $q$, the escort weights become $p$ and we end up with Campbell's length (7), with $\beta = 1/q - 1$, that is $q = 1/(\beta + 1)$. Concerning the entropy bounds in (8) and (20), we also observe that $H_{1/q}(P) = H_q(p)$ (indeed $\sum_i P_i^{1/q} = (\sum_j p_j^q)^{-1/q}$), so that the two inequalities (8) and (20) are finally equivalent. This is a new illustration of the duality between standard and escort distributions.
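As a last remark, let us mention that applying Jensen's inequality to the convex function $x \mapsto D^{(q-1)x}$ in the sum defining $L_q$ in (14) gives $\sum_i P_i D^{(q-1) l_i} \ge D^{(q-1) M_q}$, where $M_q$ is the generalized mean (11), taken with respect to the escort distribution. For $q < 1$, the factor $1/(q-1)$ in (14) is negative, so that $M_q \ge L_q$ and

$$ M_q \ge L_q \ge H_q(p), \qquad q \le 1, \tag{21} $$

while for $q > 1$ the first inequality is reversed, $L_q \ge M_q$. For general lengths, equality in $M_q \ge L_q$ requires the function in Jensen's inequality to be a straight line, which means $q = 1$. In that case, we still obtain $M_1 \ge H_1(p)$, which is nothing but the standard Shannon coding theorem.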
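The duality described in this section, together with (21), can be checked numerically with a short sketch. It assumes $D = 2$ and $q = 1/2$, uses hypothetical helper names, and takes the $q = 1$ lengths of Table 1 as an example of lengths satisfying the Kraft-McMillan inequality.

```python
import math

def escort(p, q):
    """Escort distribution (3)."""
    w = [pi ** q for pi in p]
    s = sum(w)
    return [wi / s for wi in w]

def renyi_entropy(p, alpha, D=2):
    """Renyi entropy (1) in base D."""
    if alpha == 1.0:
        return -sum(pi * math.log(pi, D) for pi in p)
    return math.log(sum(pi ** alpha for pi in p), D) / (1.0 - alpha)

def campbell_length(p, lengths, beta, D=2):
    """Campbell length (7)."""
    return math.log(sum(pi * D ** (beta * li) for pi, li in zip(p, lengths)), D) / beta

def length_Lq(p, lengths, q, D=2):
    """Length measure (14), for q != 1."""
    P = escort(p, q)
    return math.log(sum(Pi * D ** ((q - 1) * li) for Pi, li in zip(P, lengths)), D) / (q - 1)

def mean_length_Mq(p, lengths, q):
    """Generalized mean (11)."""
    return sum(Pi * li for Pi, li in zip(escort(p, q), lengths))

p = [0.48, 0.3, 0.1, 0.05, 0.05, 0.01, 0.01]
lengths = [1, 2, 3, 4, 5, 6, 6]  # q = 1 code of Table 1
q = 0.5
beta = 1.0 / q - 1.0             # so that q = 1/(beta + 1)
P = escort(p, q)

print(escort(P, 1.0 / q))                                                  # recovers p
print(renyi_entropy(P, 1.0 / q), renyi_entropy(p, q))                      # H_{1/q}(P) = H_q(p)
print(length_Lq(P, lengths, 1.0 / q), campbell_length(p, lengths, beta))   # (14) on P, index 1/q, gives (7)
print(mean_length_Mq(p, lengths, q), length_Lq(p, lengths, q), renyi_entropy(p, q))  # (21), q <= 1
```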
7. Conclusions



In this Letter, we have pointed out the relevance of Renyi entropy and escort distributions in the context of source coding. This suggests possible connections between coding theory and the main tools of nonextensive statistical mechanics. We first outlined an overlooked result by Campbell that gave the first operational characterization of Renyi entropy, as the lower bound in the minimization of a deformed measure of length. We then considered some alternative definitions of measures of length. We showed that Campbell's optimal codes can also be obtained using another natural measure of length based on escort distributions. Interestingly, this provides an easy practical procedure for the computation of these codes. Next, we introduced a third measure of length involving both an exponentiation, as in Campbell's case, and escort distributions. We showed that this length is also bounded below by a Renyi entropy. Finally, we showed that the duality between standard and escort distributions connects some of these results.

Further work should consider the extension of these results, namely the new length definitions, to the context of channel coding. With these new lengths, we also intend to investigate the problem of model selection, as in Rissanen's MDL (Minimum Description Length) procedures.